Gradient-free Policy Architecture Search and Adaptation
We develop a method for policy architecture search and adaptation via
gradient-free optimization which can learn to perform autonomous driving tasks.
By learning from both demonstration and environmental reward, we obtain a model
that learns with relatively few early catastrophic failures. We first learn
an architecture of appropriate complexity to perceive aspects of world state
relevant to the expert demonstration, and then mitigate the effect of
domain-shift during deployment by adapting a policy demonstrated in a source
domain to rewards obtained in a target environment. We show that our approach
allows safer learning than baseline methods, offering a reduced cumulative
crash metric over the agent's lifetime as it learns to drive in a realistic
simulated environment.
Comment: Accepted at the Conference on Robot Learning, 2017.
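The abstract does not specify the optimizer, so here is a minimal sketch of what a gradient-free policy search loop might look like, using a simple perturb-and-select evolution strategy; the `env_step` interface, the linear policy, and all hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of gradient-free policy optimization via a simple
# evolution strategy. Illustrative only: `env_step` (state, action) ->
# (next_state, reward, done) is an assumed placeholder interface.
import numpy as np

def rollout_reward(theta, env_step, horizon=100):
    """Run one episode with a linear policy a = theta @ s; return total reward."""
    state, total = np.zeros(theta.shape[1]), 0.0
    for _ in range(horizon):
        action = theta @ state
        state, reward, done = env_step(state, action)
        total += reward
        if done:
            break
    return total

def evolve_policy(env_step, dim_s=4, dim_a=2, pop=32, iters=50, sigma=0.1):
    """Iteratively perturb the best-so-far parameters; keep the best perturbation."""
    theta = np.zeros((dim_a, dim_s))
    best = rollout_reward(theta, env_step)
    for _ in range(iters):
        candidates = [theta + sigma * np.random.randn(dim_a, dim_s) for _ in range(pop)]
        scores = [rollout_reward(c, env_step) for c in candidates]
        if max(scores) > best:
            best, theta = max(scores), candidates[int(np.argmax(scores))]
    return theta, best
```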
Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations
Changes in the data distribution at test time can have deleterious effects on
the performance of predictive models p(y | x). We consider situations where
there are additional meta-data labels (such as group labels), denoted by z,
that can account for such changes in the distribution. In particular, we assume
that the prior distribution p(y, z), which models the dependence between the
class label y and the "nuisance" factors z, may change across domains,
either due to a change in the correlation between these terms, or a change in
one of their marginals. However, we assume that the generative model for
features p(x | y, z) is invariant across domains. We note that this corresponds
to an expanded version of the widely used "label shift" assumption, where the
labels now also include the nuisance factors z. Based on this observation, we
propose a test-time label shift correction that adapts to changes in the joint
distribution p(y, z) using EM applied to unlabeled samples from the target
domain distribution, p_t(x). Importantly, we are able to avoid fitting a
generative model p(x | y, z), and merely need to reweight the outputs of a
discriminative model p_s(y, z | x) trained on the source distribution. We
evaluate our method, which we call "Test-Time Label-Shift Adaptation" (TTLSA),
on several standard image and text datasets, as well as the CheXpert chest
X-ray dataset, and show that it improves performance over methods that target
invariance to changes in the distribution, as well as over baseline empirical
risk minimization methods. Code for reproducing experiments is available at
https://github.com/nalzok/test-time-label-shift .
Comment: 24 pages, 7 figures.
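As a concrete illustration of the reweighting idea, here is a minimal sketch of EM-based prior re-estimation over a flattened (y, z) label space, in the style of classic label-shift EM (Saerens et al.); the array names and shapes are assumptions, and the authors' actual implementation is in the linked repository.

```python
# Minimal sketch of EM for test-time label-shift correction.
# `probs_src` holds p_s(y, z | x) for n unlabeled target inputs, as an
# (n, K) array over K flattened (y, z) combinations; `prior_src` is the
# (K,) source prior p_s(y, z). Both are assumed inputs.
import numpy as np

def em_label_shift(probs_src, prior_src, iters=100, tol=1e-8):
    """Estimate the target prior p_t(y, z) by EM, then return the
    re-weighted posteriors p_t(y, z | x)."""
    prior_tgt = prior_src.copy()
    for _ in range(iters):
        # E-step: reweight source posteriors by the current prior ratio.
        w = probs_src * (prior_tgt / prior_src)          # shape (n, K)
        post = w / w.sum(axis=1, keepdims=True)
        # M-step: the new prior is the average posterior over the batch.
        new_prior = post.mean(axis=0)
        if np.abs(new_prior - prior_tgt).max() < tol:
            prior_tgt = new_prior
            break
        prior_tgt = new_prior
    w = probs_src * (prior_tgt / prior_src)
    return prior_tgt, w / w.sum(axis=1, keepdims=True)
```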
LANISTR: Multimodal Learning from Structured and Unstructured Data
Multimodal large-scale pretraining has shown impressive performance for
unstructured data including language, image, audio, and video. However, a
prevalent real-world scenario involves the combination of structured data types
(tabular, time-series) with unstructured data which has so far been
understudied. To bridge this gap, we propose LANISTR, an attention-based
framework to learn from LANguage, Image, and STRuctured data. The core of
LANISTR's methodology is rooted in masking-based training applied
across both unimodal and multimodal levels. In particular, we introduce a new
similarity-based multimodal masking loss that enables LANISTR to learn
cross-modal relations from large-scale multimodal data with missing
modalities. On two real-world datasets, MIMIC-IV (healthcare) and Amazon
Product Review (retail), LANISTR demonstrates remarkable absolute improvements
of 6.6% (AUROC) and up to 14% (accuracy) when fine-tuned on 0.1% and 0.01% of
labeled data, respectively, compared to state-of-the-art alternatives.
Notably, these improvements are observed even in the presence of considerable
missingness ratios of 35.7% and 99.8% in the respective datasets.
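A minimal sketch of what a similarity-based masking objective could look like: embed a fully observed multimodal input and a randomly masked copy, then pull the two embeddings together. The encoder interface, masking scheme, and loss form are illustrative assumptions, not LANISTR's exact design.

```python
# Sketch of a similarity-based multimodal masking loss. `encoder` is an
# assumed callable that fuses a dict of modality tensors into one
# (n, d) embedding per sample; all tensors are assumed float-valued.
import torch
import torch.nn.functional as F

def masking_similarity_loss(encoder, batch, mask_prob=0.15):
    """`batch` maps modality name -> (n, ...) float tensor."""
    full = encoder(batch)                                # (n, d) fused embedding
    masked = {}
    for name, x in batch.items():
        keep = (torch.rand_like(x) > mask_prob).float()  # random zero-out mask
        masked[name] = x * keep
    corrupted = encoder(masked)
    # 1 - cosine similarity: pulls masked and unmasked views together.
    return (1 - F.cosine_similarity(full, corrupted, dim=-1)).mean()
```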
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
Many approaches in generalized zero-shot learning rely on cross-modal mapping
between the image feature space and the class embedding space. As labeled
images are expensive, one direction is to augment the dataset by generating
either images or image features. However, the former misses fine-grained
details and the latter requires learning a mapping associated with class
embeddings. In this work, we take feature generation one step further and
propose a model where a shared latent space of image features and class
embeddings is learned by modality-specific aligned variational autoencoders.
This leaves us with the required discriminative information about the image and
classes in the latent features, on which we train a softmax classifier. The key
to our approach is that we align the distributions learned from images and from
side-information to construct latent features that contain the essential
multi-modal information associated with unseen classes. We evaluate our learned
latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2,
and establish a new state of the art on generalized zero-shot as well as on
few-shot learning. Moreover, our results on ImageNet with various zero-shot
splits show that our latent features generalize well in large-scale settings.
Comment: Accepted at CVPR 2019.
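To make the alignment idea concrete, here is a minimal sketch of a distribution-alignment term that penalizes the closed-form 2-Wasserstein distance between the two modality-specific diagonal-Gaussian posteriors; the encoder interfaces and variable names are illustrative assumptions, not the paper's full objective (which also includes reconstruction terms).

```python
# Sketch of aligning two modality-specific VAE posteriors so image
# features and class embeddings share one latent space. Each encoder is
# an assumed callable returning (mu, logvar) of a diagonal Gaussian.
import torch

def gaussian_w2(mu1, logvar1, mu2, logvar2):
    """Closed-form 2-Wasserstein distance between diagonal Gaussians."""
    std1, std2 = torch.exp(0.5 * logvar1), torch.exp(0.5 * logvar2)
    return torch.sqrt(((mu1 - mu2) ** 2).sum(-1) + ((std1 - std2) ** 2).sum(-1))

def alignment_loss(img_enc, cls_enc, img_feats, cls_emb):
    """Penalize the distance between the image and class-embedding posteriors."""
    mu_i, lv_i = img_enc(img_feats)
    mu_c, lv_c = cls_enc(cls_emb)
    return gaussian_w2(mu_i, lv_i, mu_c, lv_c).mean()
```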
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
Selective prediction aims to learn a reliable model that abstains from making
predictions when the model uncertainty is high. These predictions can then be
deferred to a human expert for further evaluation. In many real-world
scenarios, however, the distribution of test data is different from the
training data. This results in more inaccurate predictions, necessitating
increased human labeling, which is difficult and expensive in many scenarios.
Active learning circumvents this difficulty by only querying the most
informative examples and, in several cases, has been shown to lower the overall
labeling effort. In this work, we bridge the gap between selective prediction
and active learning, proposing a new learning paradigm called active selective
prediction which learns to query more informative samples from the shifted
target domain while increasing accuracy and coverage. For this new problem, we
propose a simple but effective solution, ASPEST, that trains ensembles of model
snapshots using self-training with their aggregated outputs as pseudo labels.
Extensive experiments on several image, text and structured datasets with
domain shifts demonstrate that active selective prediction can significantly
outperform prior work on selective prediction and active learning (e.g., on the
MNIST→SVHN benchmark with a labeling budget of 100, ASPEST improves the
AUACC metric from 79.36% to 88.84%) and achieves more effective utilization of
humans in the loop.
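As an illustration of the snapshot-ensemble self-training idea, here is a minimal sketch that averages checkpoint softmax outputs, keeps confident predictions as pseudo-labels, and abstains on low-confidence points; the function names, threshold, and coverage rule are illustrative assumptions, not ASPEST's exact procedure.

```python
# Sketch of pseudo-labeling from an ensemble of model snapshots and of
# confidence-based selective prediction. `snapshot_probs` is an assumed
# list of (n, K) softmax arrays, one per fine-tuning checkpoint.
import numpy as np

def aggregate_and_pseudolabel(snapshot_probs, threshold=0.9):
    """Average the snapshots; keep confident predictions as pseudo-labels."""
    mean_probs = np.mean(snapshot_probs, axis=0)         # ensemble average
    confidence = mean_probs.max(axis=1)                  # selection score
    labels = mean_probs.argmax(axis=1)
    keep = confidence >= threshold                       # confident subset
    return labels[keep], keep, confidence

def selective_predict(snapshot_probs, coverage=0.8):
    """Abstain on the least confident (1 - coverage) fraction of points."""
    mean_probs = np.mean(snapshot_probs, axis=0)
    conf = mean_probs.max(axis=1)
    cutoff = np.quantile(conf, 1 - coverage)
    preds = mean_probs.argmax(axis=1)
    return np.where(conf >= cutoff, preds, -1)           # -1 = defer to human
```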